Search - data corpus mining

Search list

[Windows Develop] SogouC.mini.20061127

Description: An off-the-shelf text classification corpus, distributed as SogouC.mini.20061127.zip; it can be used for web data mining.
Platform: | Size: 169984 | Author: zimeng | Hits:

[Algorithm] Apriori_DIC

Description: The classic data mining algorithms Apriori and DIC, together with Brin's paper on DIC and a training corpus.
Platform: | Size: 115712 | Author: luowei | Hits:

[Windows Develop] 1

Description: Research and Implementation of Text Clustering Based on the WEKA Platform. Text clustering is an important research branch of text mining and applies clustering methods to text processing. This paper discusses and summarizes in some depth the text clustering process based on the vector space model and, using a text corpus and a data mining tool, studies and implements that process. It first presents the idea and process of text clustering, reviews existing results in the field, and lists basic research work on feature representation, feature extraction, and related topics. It also reviews existing text clustering algorithms and the commonly used metrics for evaluating clustering quality. Building on these results, it uses the 20 Newsgroup text corpus and the vector space representation model to implement text preprocessing and the k-means clustering algorithm on the open-source data mining platform WEKA and, based on the actual clustering results, proposes optimizations for text representation, feature selection, and feature dimensionality reduction.
Platform: | Size: 1022976 | Author: yueyue | Hits:

[Other] SogouC.reduced.20061102.tar

Description: The Sogou corpus for text classification; very useful in data mining and machine learning.
Platform: | Size: 24369152 | Author: 张杰 | Hits:

[DataMining] learning-data-mining-with-python

Description: Source code accompanying the book "Python Data Mining: Introduction and Practice" (《Python数据挖掘入门与实践》), Chapter1-Chapter12. It runs in IPython notebook and includes application examples such as social media mining, author attribution, news corpus analysis, and big data processing.
Platform: | Size: 12564480 | Author: 启民 | Hits:

CodeBus www.codebus.net